SUMMARY


PROBLEM: In responding to items of psychological tests, subjects' answers
are influenced by the form in which responses are presented. For example,
personality test items customarily present a statement of behavior or
feeling followed by response alternatives of "yes," "?," or "no." The
tendency to answer "?" to personality items may be indicative of a
personality trait itself, aside from the traits which the test was designed
to measure. This study investigates the "?" response with respect to its
reliability and its relationship to personality and intelligence.

SUBJECTS AND PROCEDURE: Three Guilford-Martin inventories were
administered to 344 Naval Aviation Cadets. The Guilford-Martin tests
purportedly measure 13 personality traits: GAMIN, STDCR, and OAgCo
respectively. Bernreuter tests were also available for 277 of these subjects.
Thirteen Guilford-Martin trait scores, ACE test scores, and years of schooling
completed were used as independent variables. Four "?" scores were obtained
by summing the number of "?" responses on each personality test.
Intercorrelations were computed for these four scores, as were correlations
between selected combinations of scores. From this analysis, Bernreuter "?"
scores and the sum of "?" scores from the three Guilford-Martin tests were
defined as dependent variables. These two scores were then correlated with
Guilford-Martin trait scores, ACE scores, and years of schooling completed.

RESULTS: (1) The mean number of "?'s" was equal to about eight
percent of the number of personality items.

(2) The distribution of "?" scores approximated a J-curve, one-half of
the normal distribution.

(3) The reliability for Bernreuter "?" scores was estimated as .64; for
a single Guilford-Martin test as .72; for the sum from three
Guilford-Martin inventories in the low eighties.

(4) Question mark scores were independent of ACE and educational level
attained within the restricted range of scholastic aptitude studied.

(5) Bernreuter and Guilford-Martin "?" scores correlated significantly,
and about equally, with ten Guilford-Martin traits: GAMIN, OAgCo, and TR of
test STDCR. These correlations were judged spurious, however, because of the
method of scoring "?'s" utilized in Guilford-Martin tests.

CONCLUSIONS: Subjects who frequently use the "?" response to
personality items of one test tend to use it often on other similar tests.
Thus, the number of "?" responses is a measure of a reliable trait. The
subject's intelligence does not appear to influence the number of "?"
responses he will use. The tendency to use the "?" category was found to
be correlated with many specific personality traits. These correlations
were interpreted as an artifact of the method of scoring personality tests,
rather than representing the psychological correlates of people who respond
by "?." Independent measures of personality are needed to find such
correlates of "?" responses.


I.  INTRODUCTION

A.  History  of  the  Problem 

Many investigators have been concerned directly or indirectly with
response sets. Cronbach (4) defines response set as "any tendency causing
a person consistently to give different responses to test items than he
would when the same content is presented in a different form." For example,
observations of variability in such factors as speed and accuracy on
objective tests and productivity on essay tests are commonplace. This report
presents preliminary findings concerning a long-discussed problem for which
little empirical evidence has been available.

In their broader dimensions, response sets have been discussed in
relation to topics varying from constant errors in psychophysics (4) to work
simplification by means of time and motion studies (14). The use of the
middle category, one type of response set, dates back to Titchener, who was
concerned with equality judgments in psychophysical research (18). A typical
experiment in psychophysical research consists of having subjects judge
whether one weight is heavier than, lighter than, or equal in weight to
another. Titchener discussed the issue of retaining the middle category (e.g.,
equal in weight) versus eliminating it, without clearly resolving it. In 1907,
Angell (2) noted that subjects exhibited differences in their use of equality
judgments, and he felt "that this difference corresponds to the difference
between deliberate and impulsive temperaments." Fernberger (6) found that
different instructions produce significantly different numbers of equality
judgments in psychophysical experiments. Based on this finding and the
results of previous research, he concluded that equality judgments are
dependent upon form of instructions, subjects' attitudes, and their basic
temperament. He felt it desirable to retain the psychophysical method which
utilized a middle category. Woodworth (19) reviewed the debate about middle
category retention versus elimination and concluded that from the laboratory
point of view there was no real basis for favoring one method over the other.


B.  Recent  Research  Relating  to  the  Problem 

In selecting one of the response categories of objective-type
questionnaires, a subject is confronted with what is analogous to a
psychophysical judgment. In addition to this, there also exists a semantic
problem, as shown in a study by Mosier (9). He concluded that there were
reliable differences in subjects' interpretation of words commonly used in
interest, attitude, and personality tests. The meanings assigned by students
to such words as "frequently," "indifferent," and "desirable" differed
significantly. He found that students preferred "good" to "better," and
"bad" to "worse," as shown by their ratings of these words.

Some of the research in this area relates to the general problem
of the number and nature of response categories to be used. Osgood (10)
showed that on a seven-point scale some subjects predominantly used 1 and 7,
some used 1, 4, and 7, while some used the whole scale. Remmers and Sageser
(11) showed that, within limits, the more choices on a multiple-choice
attitude scale the greater the test reliability.

In an article concerned with the effect of response sets on
reliability and validity, Cronbach (4) listed the following response sets:
(1) tendency to gamble; (2) definition of judgment categories; (3)
inclusiveness of response; (4) bias, acquiescence, tendency to agree; (5)
speed versus accuracy. He presented evidence from the literature that each
of these types of response sets was a reliable trait in certain test
situations. He felt that the existence of still other response sets was likely.

Most of the studies in this area have aimed at establishing one
or another of the response sets as a stable trait or factor which could be
measured reliably. In one of the first and more extensive studies along
this line, Lorge (8) found positive correlations between corresponding answer
categories of the Bernreuter, the Strong Vocational Interest Blank, and a
Thorndike and a Thurstone attitude scale. For example, he found positive
correlations among the number of "?" responses on the Bernreuter, the
number of "?" responses on the Thurstone, the number of "I" responses on
the Strong, and the number of "5" responses on the Thorndike. The
correlation coefficients were not reported.

Lorge inferred from his findings (1) that the method of rating
items introduced a special effect which he considered a halo effect; (2)
"that the tendency to respond by 'yes's,' 'no's,' '?'s,' or similar rubrics
may be symptomatic of a special aspect of personality." The first inference
implies that the desired approach to the problem is elimination or control
of response set through improved test construction. This conclusion has
been reached in most of the subsequent studies. Lorge's second inference
implies that measures of response sets may represent measures of personality.
There have been few empirical studies relating to this inference.

The argument for trying to eliminate or control response-set
variability is supported by evidence demonstrating the effects of this source
of variation on test reliability and test validity. Lentz (7) reported that
acquiescence, or tendency to agree, was a potent factor in lowering
reliability of personality measurements and pointed out the need for
controlling this factor. Cronbach (5) investigated the factor of acquiescence
and its effect on the reliability and validity of a series of true-false
tests. The acquiescence factor had test-test reliability coefficients which
ranged  from  .Jo  to  .61,  all  of  which  were  significant  at  the  .01  level  of 
confidence. The reliability coefficients of "false" scores (based on items
marked false) were generally greater than those for "true" scores (based on
items marked true). In three cases out of a total of ten, C. R.'s between
corresponding reliabilities for "true" and "false" scores were significant.
Using course grades derived from tests other than true-false as a criterion,
the following validity coefficients were reported for the "true," "false,"
and total scores of two true-false tests:

          "True"   "False"   Total

Test 1     .222     .666      .6?0

Test 2     .319     .700      .598

Cronbach's study presents cogent evidence for the existence of the
acquiescence factor and its effects on the reliability and validity of
true-false tests.


Rundquist (13) showed that form of statement, a correlative of
response set, affected the validity of items on a personality test of the
personal inventory type. Scores on "acceptable" (positively stated) items
were less valid than scores on "unacceptable" (negatively stated) items.

After reviewing numerous studies demonstrating various response
sets, Cronbach (3) stated that response sets generally tend "to reduce the
saturation of a test and to limit its possible validity." He recommended
that response sets be avoided "with the occasional exception of some tests
measuring carefulness or other personality traits which are psychologically
similar to response sets." Whether a response set and a personality trait
are "psychologically similar" is a matter of hypothesis, however. Empirical
evidence for the relationship between response set and personality is rare.
Two studies which move in this direction are those of Swineford (16) and
Lorge (8).


Swineford (16) found evidence for the existence of "tendency to
gamble" as a stable factor. This gambling tendency is measured by allowing
subjects to assign 1, 2, or 4 points to objective test items which have
right and wrong answers. This technique is an indirect method of assessing
the certainty which a subject ascribes to his answers. The intercorrelations
of G (gambling) scores from four different tests ranged from .201 to .798,
with a multiple R of .847. The distributions of the G scores were positively
skewed; none of them approached normality. The only evidence for G as a
"personality trait" was that it was so named, and was not correlated with
ability factors.

Smith and Tyler (13) found that the tendency to use intermediate
rather than more extreme scale positions could be used as a reliable index
of students' behavior with respect to their "caution" in drawing conclusions.
They reported a test-test reliability coefficient of .83 for the caution
factor.


Rundquist (13) considered the possible meaningfulness of the
response set termed "the tendency to take extreme scale positions." He
tested 111 factory girls with separate series of personality and interest
items. He found that the tendency to take extreme scale positions
correlated .40 between interest and personality items. In view of this
relatively low reliability, he felt that the response set reflected
situational factors rather than anything basic to personality. With regard to
the type of personality and interest items used, he regarded the elimination
of response set as more profitable than attempting to measure it. The fact
that Rundquist correlated the total for both extremes of the scale on one
test with the total for both extremes on the other may have suppressed
reliability. There is reason to believe that a separate response set operates
for each extreme of a given scale.

The research reviewed in this section indicated the existence of
various types of response sets. However, the question of what to do about
a response set once it has been found remains unanswered. Some authors
(3,7,12) have shown that in certain test situations response sets adversely
affect reliability and validity. Other investigators (15,16), working with
different tests, have indicated that response sets are valuable indices of
certain aspects of personality. Apparently, the dissimilarity of these
findings stems more from the differences in interests and purposes of the
investigators than anything else. It appears that the question of what to
do with a response set may have to be answered separately for each
psychological test. Whenever an existing test is shown to yield a stable
measure of a response set, the problem of dealing with it must be resolved by
weighing the effects on reliability and validity on the one hand and the
intrinsic value of the response set measure as an index of personality on
the other.


II.  STATEMENT  OF  PRESENT  PROBLEM


This investigation is concerned primarily with a study of the
relationships of the middle-category ("?") response set to certain
personality variables. At the same time this research throws light on the
question of the existence and stability of this particular response set.

In  particular,  the  following  hypotheses  will  be  tested: 

(1)  That  there  is  a response  set  which  predisposes 
some  individuals  to  give  a greater  number  of 
question  mark  responses  than  do  others. 

(2)  That  there  are  significant  correlations  between 
the  relative  number  of  question  mark  responses 
which  individuals  give  on  one  test  and  the  number 
which  they  give  on  other  tests. 

(3)  That  a disposition  toward  giving  question  mark 
responses  may  be  shown  to  be  related  to  certain 
dimensions  of  personality.

III.  METHOD


Guilford-Martin and Bernreuter personality inventory scores for
several hundred Naval Aviation Cadets at Pensacola, Florida were available
for study by virtue of their use in another research project.* The
Guilford-Martin inventories used, with the personality trait scores they
define, were:

1.  An Inventory of Factors STDCR (Test STDCR)

    a.  S _____ Social introversion - extraversion
    b.  T _____ Thinking introversion - extraversion
    c.  D _____ Depression
    d.  C _____ Cycloid disposition
    e.  R _____ Rhathymia

2.  The Guilford-Martin Inventory of Factors GAMIN (Test GAMIN)

    a.  G _____ General pressure for overt activity
    b.  A _____ Ascendancy
    c.  M _____ Masculinity
    d.  I _____ Lack of inferiority feelings
    e.  N _____ Lack of nervous tenseness

3.  Guilford-Martin Personnel Inventory (Test OAgCo)

    a.  O _____ Objectivity
    b.  Ag ____ Agreeableness
    c.  Co ____ Cooperativeness

The papers for these cadets were scored by keying the "?" response with a
weight of one. Three "?" scores are thus obtained for the 344 cadets on the
three tests of the Guilford-Martin series and one score for 277 of these
subjects on the Bernreuter. These scores represent the number of "?"
responses given by a subject on each of the four personality tests
administered. These data, along with ACE Quantitative, Linguistic, and Total
Scores, educational level, and the thirteen personality trait scores derived
from the Guilford-Martin inventories, were punched into IBM cards.
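
As an illustration only, the scoring rule just described (each "?" response
keyed with a weight of one and summed over items) might be sketched as
follows. The response coding ("Y", "N", "?"), the function name, and the
sample answer sheets are assumptions made for this sketch; they are not taken
from the original scoring procedure.

```python
def question_mark_score(responses):
    """Return the number of "?" responses on one answer sheet.

    `responses` is a sequence of item responses, assumed here to be coded
    as "Y" (yes), "N" (no), or "?" (middle category).
    """
    # Each "?" is keyed with a weight of one; "Y" and "N" contribute nothing.
    return sum(1 for r in responses if r == "?")


# Hypothetical answer sheets for one subject on two inventories.
sheet_a = ["Y", "?", "N", "N", "?", "Y"]
sheet_b = ["?", "Y", "Y", "N"]

scores = [question_mark_score(s) for s in (sheet_a, sheet_b)]
total = sum(scores)  # composite "?" score across the inventories
```

A composite score of this kind is what the later sections call the sum of
"?" scores across inventories.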


IV.  RESULTS 

1. The mean number of "?" responses is equal to about eight
percent of the number of personality items. Variability of "?" scores is
considerable. (See Table I).

2. The distribution of "?" scores approximates a J-curve. (See
Table II).

* The authors are grateful to Drs. Richard Trumbull and John Manhold for
making these data available.

3. The tendency to use the "?" category is a reliable trait, as
shown by the intercorrelations for "?" scores from various personality tests.
(See Table III).

4. Question mark scores do not correlate significantly with
scholastic aptitude and educational level attained within the restricted
range of scholastic aptitude studied. Bernreuter and Guilford-Martin "?"
scores correlate significantly, and about equally, with ten Guilford-Martin
personality traits: GAMIN, OAgCo, and TR of Test STDCR. (See Table IV).


V.  DISCUSSION  OF  RESULTS

A.  Reliability  and  Related  Statistics 

The variability and the nature of the distribution are among the
first problems to consider in describing a trait. Thus, the extent to which
the "?" category is used, and the nature of the distribution of "?" scores,
were presented first in Tables I and II. From these data, it is noted that
the mean number of "?" responses for this group was about equal to 8% of
the number of items of the Guilford-Martin and Bernreuter tests.
Furthermore, subjects differ considerably in the extent to which this category
is used, as shown by the standard deviations and frequency distribution
presented. This distribution is positively skewed, the shape approximating
closely the J-curve, half of a normal distribution.

Having established that the variability in "?" scores is
considerable, the problem of reliability arises. With "?" scores from four
separate tests to be correlated with many other variables, it becomes a
practical as well as theoretical issue to consider the best single composite
score that can be derived from the many possible combinations of scores. The
correlations reported in Table III were calculated with this in mind. The
blank cells in the matrix involve correlations of a single test score with a
sum which would be based in part on itself. This would result in spuriously
high reliabilities and make their interpretation difficult.
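
The reason for leaving these cells blank can be illustrated with a small
numerical sketch. The scores below are invented for the purpose, and
pearson_r is a helper written for this example; the point is only that a
part correlated with a sum containing itself yields a larger coefficient than
the same part correlated with the sum of the remaining tests.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical "?" scores for six subjects on three tests.
test_a = [2, 9, 4, 15, 7, 1]
test_b = [4, 3, 8, 5, 1, 6]
test_c = [3, 5, 2, 6, 7, 1]

sum_of_others = [b + c for b, c in zip(test_b, test_c)]
sum_of_all = [a + b + c for a, b, c in zip(test_a, test_b, test_c)]

r_clean = pearson_r(test_a, sum_of_others)  # part vs. sum that excludes it
r_inflated = pearson_r(test_a, sum_of_all)  # part vs. sum that includes it
```

With these invented figures the second coefficient is markedly larger than
the first, although no additional information about the trait has been added.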

Certain relationships are immediately apparent from this table.
The intercorrelations between the three Guilford-Martin tests are slightly
higher than between the Bernreuter and Guilford-Martin tests. This
discrepancy could arise from many factors, but one of the first to consider is
the difference between the number of items in the Bernreuter and the
Guilford-Martin inventories. The former consists of 123 items and the
latter of 5H* We can approximate the number of items on each of the
Guilford-Martin tests as 173. Applying the Spearman-Brown prophecy formula
to the 123-item Bernreuter test, whose reliability is estimated as .64, the
increased length to 173 items yields a calculated reliability of .71. This
value approximates closely the observed reliability of .72 estimated for the
Guilford-Martin tests.
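
The Spearman-Brown step above can be checked with a few lines of Python. The
function name and its arguments are assumptions of this sketch; the figures
(123 items, 173 items, reliability .64) are those quoted in the text.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio."""
    k = length_ratio
    return (k * reliability) / (1 + (k - 1) * reliability)


# A 123-item test with reliability .64, lengthened to 173 comparable items.
r_lengthened = spearman_brown(0.64, 173 / 123)
print(round(r_lengthened, 2))  # 0.71
```

The formula assumes that the added items are comparable to those already in
the test, which is the same assumption the text makes.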

Many factors could account for the observed correlations, but
other things being equal, the most important factor influencing reliability
is the sampling of items. From the above results, Bernreuter "?" scores
would seem to have as much in common with Guilford-Martin scores as the
latter have with each other. This suggests that differences in item content
between the Bernreuter and Guilford-Martin are equally unimportant in the
production of "?" responses.

The reliability for the total "?" score obtained from the three
Guilford-Martin inventories was desired since these tests are commonly
administered together. The reliability for a single Guilford-Martin test
can be estimated from the intercorrelations between the three inventories.
The Spearman-Brown formula for tripling the length of a test is then applied
to this reliability estimate for a single inventory. The value calculated
from the Spearman-Brown formula is a maximum estimate for the reliability of
the total "?" score from the Guilford-Martin inventories.

The reliability for a single Guilford-Martin test is estimated as
.72, the median value for the intercorrelations (.70, .72, and .76) of the
three Guilford-Martin tests. The calculated value from the Spearman-Brown
formula is .89, an estimate of the maximum reliability when a test has been
tripled in length by the addition of comparable items.

A minimum estimate of the reliability for total "?" scores is
obtained by considering the correlation between "?" scores for a single
Guilford-Martin test and the sum of the remaining two inventories. This
reliability estimate is less than that for tripling the length of a test.
The estimates found in Table III are .76, .78, and .80; the median value of
.78 is considered as the best minimum reliability estimate.

From the maximum and minimum estimates obtained, it is concluded
that the reliability for total "?" scores from the Guilford-Martin is in the
low eighties.
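
The bracketing of the total-score reliability can be retraced as below, using
only figures quoted in the text. Variable names are illustrative, and taking
the midpoint of the two estimates is an assumption of this sketch, offered as
one reading of "in the low eighties" rather than as the authors' computation.

```python
from statistics import median

# Intercorrelations among the three Guilford-Martin "?" scores.
single_test_r = median([0.70, 0.72, 0.76])

# Maximum estimate: Spearman-Brown formula for a test tripled in length.
upper = 3 * single_test_r / (1 + 2 * single_test_r)

# Minimum estimate: median correlation of one test with the sum of the
# other two, as read from Table III.
lower = median([0.76, 0.78, 0.80])

# Midpoint of the bracket (an assumption of this sketch).
midpoint = (upper + lower) / 2

print(round(upper, 2), lower, round(midpoint, 2))  # 0.89 0.78 0.83
```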


B.  Correlations  With  Other  Variables 

The correlations reported in Table III indicate that a stable trait
is being measured for the samples of behavior observed. These results are
in keeping with previous research in this area. An important hypothesis
remains to be tested. That is, are there significant relationships between
"?" scores and other variables, particularly those in the personality realm?
It was postulated that the tendency to use the "?" category is a response
set indicative of personality trends.

It remains to be demonstrated that significant correlations
exist between "?" scores and personality variables. The only data available
at this writing in the personality area are scores on personality traits
from the Guilford-Martin itself. One other hypothesis for which data were
available concerns the relationship between scholastic aptitude, educational
level, and "?" scores. Table IV lists the correlations for these variables.

Because personality trait scores from the Guilford-Martin may be
contaminated by the number of "?" responses, separate correlations were
computed for Bernreuter "?" scores. The sum of all "?" responses for the
four tests was also correlated with Guilford-Martin trait variables to
note any increase in correlation due to increased reliability by virtue of
increased length. The issue of contamination of Guilford-Martin personality
trait scores by the "?" category will be discussed at greater length
shortly.


It will be noted in Table IV that scholastic aptitude (as measured
by the ACE) and educational level (number of years of schooling completed) do
not correlate significantly with the "?" variable. This result is
consistent with the findings of Swineford (16), referred to earlier. The
independence of "?" scores from variables of education and scholastic aptitude
is a very useful property for predictive problems involving multiple
correlation. This implies that significant correlations may be found between
"?" scores and other predictors of the criterion investigated.

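Why independence from aptitude is useful in multiple correlation can be seen
from the standard two-predictor formula, in which the squared multiple
correlation equals (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). The
sketch below uses hypothetical validities, not values from this study, and
the function name is an assumption: when the two predictors are uncorrelated
(r_12 = 0), each contributes its full squared validity, whereas overlapping
predictors contribute less in combination.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)


# Hypothetical validities: predictor 1 (say, an aptitude score) correlates
# .40 with the criterion; predictor 2 (say, a "?" score) correlates .30.
independent = multiple_r(0.40, 0.30, 0.0)  # predictors uncorrelated
overlapping = multiple_r(0.40, 0.30, 0.6)  # predictors share variance
```

With these illustrative figures the uncorrelated pair yields the larger
multiple correlation.
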
In the personality realm, significant correlations are noted
between "?" scores and all Guilford-Martin trait scores from Tests GAMIN and
OAgCo. In addition, traits T and R from Test STDCR correlate significantly
with the "?" variables. No great change in the magnitude of these
correlations is noted when either the total Guilford-Martin "?" scores are
used, or the Bernreuter, or the sum of both. There is some tendency for the
total sum to correlate higher, but the differences noted would not seem to
warrant the additional testing time required.

On the basis of these correlations, it might be concluded that
individuals who tend to respond with many "?" responses tend to be less
objective (O), agreeable (Ag), cooperative (Co), active (G), ascendant (A),
masculine (M), self-confident (I), free from neurotic tendencies (N),
reflective (T), and impulsive (R).

It is of interest to note that the three different "?" scores used
to correlate with Guilford-Martin trait scores yield comparable results.
There is generally some decrease in the size of these correlations for the
Bernreuter. However, the same traits are revealed as significant. This is
of practical importance since testing time available could become crucial
in other studies involving the "?" variable. It would appear that for the
purpose of demonstrating the existence of correlation between "?" scores and
other variables, as well as a general indication of the strength of the
relationship, a single personality inventory is almost as useful as three.

The finding that "?" scores correlate negatively with
Guilford-Martin trait scores is consistent with the expected personality
pattern of subjects tending to respond with many "?'s." Nevertheless, these
results cannot be taken at face value. All Guilford-Martin trait scores are
influenced by how many "?" responses are selected by a subject. If all items
of the Guilford-Martin are marked "?," the results indicated in Table V are
obtained.

In the case of Tests GAMIN and OAgCo, subjects must respond either
"yes" or "no" to receive points toward the traits being measured. This is
also true for traits T and R of Test STDCR. The more "?" responses a
subject checks, the more likely he will receive a low raw score on all of the
above traits. In the case of traits S, D, and C of Test STDCR, a subject
may accumulate points toward these traits by marking the "?" category. For
these traits two subjects may have identical raw scores, one by virtue of
many "?" responses, the other by "yes" and "no" responses which counteract
each other. Thus, the correlations actually observed are explicable in
view of these considerations. It will be recalled that negative
correlations were obtained for trait scores where the "?" response was not
scored, and chance correlations for traits (S, D, and C) where the "?"
response was an important factor in the trait scores. It is these facts
that make difficult the interpretation of the significant correlations
obtained between personality traits and "?" responses.
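
The scoring artifact described above can be reproduced in a small simulation.
Everything in this sketch is an assumption made for illustration: the number
of items, the range of "?" propensities, and the rule that non-"?" answers
fall on the keyed alternative at random. No real trait is simulated, yet a
trait score that earns points only from "yes" or "no" answers still
correlates negatively with the "?" count.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(0)  # fixed seed so the sketch is reproducible

N_SUBJECTS, N_ITEMS = 344, 50
q_counts, trait_scores = [], []
for _subject in range(N_SUBJECTS):
    p_question = random.uniform(0.0, 0.3)  # subject's propensity to mark "?"
    q = trait = 0
    for _item in range(N_ITEMS):
        if random.random() < p_question:
            q += 1       # a "?" earns no points toward the trait
        elif random.random() < 0.5:
            trait += 1   # answer happens to fall on the keyed alternative
    q_counts.append(q)
    trait_scores.append(trait)

# Correlation between "?" count and raw trait score; negative by construction.
r = pearson_r(q_counts, trait_scores)
```

The negative coefficient here arises purely from the scoring rule, which is
the sense in which the observed correlations are called spurious.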

Bernreuter "?" scores and Guilford-Martin trait scores are
independent measures in the sense that Bernreuter "?" scores do not enter into
the score for Guilford-Martin traits. It appeared at the outset of this
study that the correlations for the Bernreuter would be of crucial importance.
However, the Bernreuter correlates .72 with the total Guilford-Martin "?"
score. The correlation of .72 suggests that Bernreuter "?" scores would be
expected to correlate with the same variables as do Guilford-Martin "?"
scores.


THEORETICAL  IMPLICATIONS 

One channel of thinking about the "?" response emerges from the
J-curve found for the distribution of scores. In social psychology, curves
of this type have been described by Allport (1) as conformity curves.
Social situations which demand conformity yield distributions of scores
which, when measured, are highly skewed, the modal behavior approaching the
cultural norm. The classic example cited is the behavior of motorists at
an intersection who are confronted by either red lights, stop signs, or a
traffic officer. Most motorists in such situations will stop completely,
some will go very slow or slightly slow, and a few will not reduce their
speed at all.

By analogy, behavior in answering personality items could be a
reflection of cultural conformity. The subject who responds with very many
"?" responses may be going through the stop sign, so to speak. In effect,
he may be avoiding the test through the refuge of the "?" response. From
this analysis, each personality item may be a reflection of a cultural norm.
The major difference between behavior in the stop sign situation and toward
personality items is that behavior in the first case is defined explicitly,
in the second implicitly.

We could define operationally the cultural norm for any item as the
proportion of people responding in the majority direction of either "yes"
or "no." Where the majority for "yes" or "no" responses is very decisive,
these segments of behavior represent cultural norms clearly crystallized
for that group. The psychological pattern inferred from the content of such
items might be interpreted as the cultural definition of adjustment. If
such a framework can be formed, a large "?" score, ipso facto, represents
non-adjustment according to cultural conformity.
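
The operational definition just given can be illustrated with a brief sketch in Python. The function name, the coding of responses as "yes", "no", and "?", and the sample data are hypothetical and are not taken from the study. The sketch also assumes that the proportion is computed over all subjects, including those answering "?", a point the text leaves open.

```python
from collections import Counter

def cultural_norm(responses):
    """Return (majority_direction, proportion) for a single item.

    responses: list of "yes", "no", or "?" answers, one per subject.
    The proportion is taken over all subjects, "?" answers included;
    that choice of denominator is an assumption of this sketch.
    Ties between "yes" and "no" are resolved in favor of "yes".
    """
    counts = Counter(responses)
    direction = "yes" if counts["yes"] >= counts["no"] else "no"
    return direction, counts[direction] / len(responses)

# Hypothetical item answered by ten subjects.
item = ["yes"] * 7 + ["no"] * 2 + ["?"]
direction, proportion = cultural_norm(item)
print(direction, proportion)
```

An item with a proportion near 1.0 would correspond to a clearly crystallized norm in the sense used above; an item near .50 would not.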

Although the suggested approach outlined above may be worthwhile,
the contamination of personality trait scores with the "?" response makes
necessary the adoption of completely independent criteria to which "?"
scores may be correlated. Preliminary analysis by the writers shows, for
example, that "?" scores are not related to "buddy ratings," scores obtained
by peer nominations for leadership qualities. Further work along this line
will be reported later. What is needed are many such independent measures
which seem to be based in great part upon personality factors.

Guilford himself must recognize the problem posed by the "?" category.
In the Guilford-Zimmerman personality test, a revision of the
Guilford-Martin, the directions now stress that subjects should avoid using
"?" responses unless absolutely necessary.

It may turn out that the "?" response in itself is not a significant
variable. However, more fruitful results may be obtained from standard
personality tests such as the Guilford-Martin when trait scores are
corrected for the number of "?" responses. Thurstone (17) has already
recommended just such a procedure in his manual for the Thurstone
Temperament Schedule. He calls for two new types of scores: (1) the number
of "?" responses made on the Temperament Schedule and (2) a score for each
personality area which is twice the number of correct responses plus the
number of "?" responses. Thurstone calls these "experimental Uncertainty
Scores," and postulates that the first of these may indicate lack of
self-confidence, or insecurity in self appraisal. The second score makes an
adjustment in trait scores so as to differentiate between subjects who
otherwise might receive the same raw score.
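
The two scores described above may be sketched in Python as follows. The function name, the input layout, and the area label are hypothetical; only the arithmetic (the total count of "?" responses, and twice the correct responses plus the "?" responses within each area) follows the description given here.

```python
def uncertainty_scores(counts_by_area):
    """Compute the two experimental scores described in the text.

    counts_by_area: dict mapping an area name to a pair
        (number of correct responses, number of "?" responses).
    Returns (total "?" responses, {area: 2 * correct + "?" responses}).
    """
    total_question = sum(q for _, q in counts_by_area.values())
    adjusted = {
        area: 2 * correct + q
        for area, (correct, q) in counts_by_area.items()
    }
    return total_question, adjusted

# Two hypothetical subjects with the same raw score in one area.
subject_a = {"area_1": (10, 0)}
subject_b = {"area_1": (10, 4)}
print(uncertainty_scores(subject_a))
print(uncertainty_scores(subject_b))
```

The two hypothetical subjects share a raw score of 10, yet the adjusted score separates them, which is the differentiation the second score is meant to provide.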

It would seem worthwhile when using objective personality tests as
predictors to develop three scores. The first would be the usual trait
scores derived from the test. The second would be the number of "?"
responses. The third would be the trait scores corrected for the number of
"?" responses in each trait. The adjustment made could be the one
recommended by Thurstone, or a similar correction.

There remains one further issue to explore. In considering the
three responses to personality items, there are several assumptions which
can be made with respect to the underlying continuum. The usual one made is
that the "?" response lies between "yes" and "no" on a continuum of judgment.
For the purpose of exposition, let us consider one item chosen arbitrarily
from the Guilford-Martin inventories: "Do you like to speak in public?"
Under the above assumption, the continuum implied is the judgment by a
subject with regard to his liking-to-speak-in-public. If his judgment is that
this behavioral gestalt is typical of him, he responds "yes." If it is not,
he responds "no." By answering with the "?" response, he implies that his
judgment is uncertain. The crucial feature of this kind of continuum is
that subjects should perceive the item content similarly.

Another possible assumption is that "yes" and "no" represent
opposite poles of a continuum, while the "?" is not on this continuum at
all. Stated another way, the "?" category is qualitatively different from
the other two. This assumption implies that subjects who respond "?" to an
item are not all reacting alike to the gestalt-like feeling associated with
the item content. Some of them may respond "?" because the item does not
apply to them, they do not understand its meaning, or they agree to one
part (like to speak) but disagree with another part (in public).

Similar reasoning may hold with respect to "yes" and "no" responses;
that is, they too may represent qualitatively different responses.
Other research has presented evidence for the existence of response sets
for these categories. Since these types of response sets have not been
explored in this paper, crucial data are lacking for the point in question.

It is not feasible to record the thought processes of subjects as
they respond to personality items. Yet, the distribution of scores for
each of the response categories may make possible inferences about the
assumptions in question. It has been reported here that the distribution
of scores for the "?" response is highly skewed, approximating the J-curve.
Further research comparing the reliability and distribution of scores for
"yes" and "no" response sets to those for "?" may shed light on this issue.

In the first portion of the theoretical implications, we have
discussed a method of exploring the meaning of item responses in their
present form. This last section suggests that the results of considering the
different assumptions underlying the response continuum may force a revision
in the type of responses available to the subjects. One such revision
possible is the abandoning of the middle category. An alternative is to
explore the meaning of "?" scores when this category is retained.

Further work is being done on the relationship of middle category
scores to independent measures of personality. These results will be
reported later.


CONCLUSIONS

With respect to the hypotheses formulated in the statement of the
problem, it may be concluded from the results presented here that:

1. There is a response set which predisposes some individuals
to respond with a greater number of "?" responses than do
others. Furthermore, the distribution of such scores
approximates closely a J-curve.

2. The reliability of "?" scores as shown by the correlations
between several tests indicates a stable trait is being measured.

3. Question mark scores are:

a. Independent of scholastic aptitude (ACE) and educational
level attained.

b. Related to Guilford-Martin personality trait scores
GAMIN, OAgCo, and T and R of Test STDCR. The evidence
from this study indicates that the significant
correlations found are spurious by virtue of the
procedures followed in scoring "?" responses on
the Guilford-Martin inventories.

4. It is also concluded that personality trait scores derived
from objective personality tests should be adjusted by
trait for the number of "?" responses. The importance
of this adjustment will be a function of the number of
times the "?" category is used and the scoring procedures
followed with respect to this category.
TABLE I

MEANS, STANDARD DEVIATIONS, AND
PERCENTAGES OF QUESTION MARK SCORES ON FOUR PERSONALITY TESTS

                                  No. of   No. of     Mean No.   Standard    % of No.
Test                              Items    Subjects   of "?"s    Deviation   of Items

1. Guilford-Martin (GAMIN)         186       344       13.30      13.18        7.15
2. Guilford-Martin (STDCR)         175       344       13.21      13.84        7.55
3. Guilford-Martin (OAgCo)         150       344       11.60      11.33        7.73
4. Bernreuter                      125       277       12.33      11.46        9.86
5. Guilford-Martin Sum
   (1, 2, and 3 above)             511       337*      36.38      35.26        7.12
6. Bernreuter and Guilford-
   Martin Sum (4 and 5 above)      636       273*      49.61      44.93        7.80

*For IBM analyses, subjects were eliminated with incomplete data on the
remaining tests.
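
The percentage column of Table I appears to be the mean number of "?" responses expressed as a percentage of the number of items. The following Python sketch checks that reading against the tabled values; the function and variable names are hypothetical, and the figures are those printed in the table.

```python
def percent_of_items(mean_question, n_items):
    """Mean number of "?" responses as a percentage of the item count."""
    return 100.0 * mean_question / n_items

# (test, number of items, mean "?" count, percentage printed in Table I)
table_one = [
    ("Guilford-Martin (GAMIN)", 186, 13.30, 7.15),
    ("Guilford-Martin (STDCR)", 175, 13.21, 7.55),
    ("Guilford-Martin (OAgCo)", 150, 11.60, 7.73),
    ("Bernreuter", 125, 12.33, 9.86),
    ("Guilford-Martin Sum", 511, 36.38, 7.12),
    ("Bernreuter and Guilford-Martin Sum", 636, 49.61, 7.80),
]

for name, n_items, mean_question, printed in table_one:
    computed = percent_of_items(mean_question, n_items)
    print(f"{name}: computed {computed:.2f}, printed {printed:.2f}")
```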


TABLE II

FREQUENCY DISTRIBUTION FOR SUM OF "?'s"
ON THREE GUILFORD-MARTIN TESTS

(N = 337)

Question Mark Score     Frequency

      0-9                  102
     10-19                  41
     20-29                  39
     30-39                  35
     40-49                  29
     50-59                  27
     60-69                  18
     70-79                   7
     80-89                  10
     90-99                  10
    100-109                  5
    110-119                  2
    120-129                  1
    130-139                  5
    140-149                  3
    150-159                  0
    160-169                  0
    170-179                  1
    180-189                  0
    190-199                  0
    200-209                  2
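
As a check on the grouped data of Table II, the following Python sketch totals the frequencies, locates the modal class, and estimates the mean from class midpoints. The names are hypothetical, and the midpoint estimate is an approximation that need not agree exactly with a mean computed from ungrouped scores.

```python
# Class lower bounds 0, 10, ..., 200 and the frequencies from Table II.
lower_bounds = list(range(0, 210, 10))
frequencies = [102, 41, 39, 35, 29, 27, 18, 7, 10, 10,
               5, 2, 1, 5, 3, 0, 0, 1, 0, 0, 2]

def grouped_mean(lower_bounds, frequencies, width=10):
    """Estimate the mean from grouped data using class midpoints.

    For integer scores, a class such as 0-9 has midpoint 4.5.
    """
    midpoints = [low + (width - 1) / 2 for low in lower_bounds]
    total = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total

total_subjects = sum(frequencies)                  # agrees with N = 337
modal_class = frequencies.index(max(frequencies))  # 0, i.e. the 0-9 class
share_in_modal_class = frequencies[modal_class] / total_subjects

print(total_subjects, modal_class, round(share_in_modal_class, 2))
print(round(grouped_mean(lower_bounds, frequencies), 2))
```

The concentration of cases in the lowest class, with a long tail toward high scores, is the J-shaped form referred to in the text.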

TABLE III

INTERCORRELATION AMONG NUMBER OF QUESTION MARK RESPONSES FOR
GUILFORD-MARTIN AND BERNREUTER PERSONALITY INVENTORIES*

(N = 277)

                                  1.    2.    3.    4.    5.    6.    7.

1. Bernreuter                     --
2. OAgCo                         .64    --
3. GAMIN                         .64   .76    --
4. STDCR                         .67   .70   .72    --
5. OAgCo and GAMIN               .68   ...   ...   .76    --
6. OAgCo and STDCR               .72   ...   .80   ...   ...    --
7. GAMIN and STDCR               .70   .78   ...   ...   ...   ...    --
8. Total Guilford-Martin Sum     .72   .89   .91   .91

*Blanks represent correlations of a single score with a sum based in part on
that score. Such correlations are spuriously high.
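
The size of such part-whole correlations can be approximated from figures already tabled. The Python sketch below applies the standard formula for the correlation of a component with an unweighted sum, using the standard deviations from Table I and the intercorrelations from Table III. The names are hypothetical. Because the two tables rest on somewhat different numbers of subjects, the results are rough approximations only.

```python
import math

# Standard deviations of "?" scores (Table I).
sd = {"GAMIN": 13.18, "STDCR": 13.84, "OAgCo": 11.33}

# Intercorrelations of "?" scores (Table III).
r = {
    frozenset(("OAgCo", "GAMIN")): 0.76,
    frozenset(("OAgCo", "STDCR")): 0.70,
    frozenset(("GAMIN", "STDCR")): 0.72,
}

def corr(a, b):
    """Correlation between two tests; a test correlates 1.0 with itself."""
    return 1.0 if a == b else r[frozenset((a, b))]

def part_whole_r(part):
    """Correlation of one test with the unweighted sum of all three tests."""
    tests = list(sd)
    cov_with_total = sum(corr(part, t) * sd[part] * sd[t] for t in tests)
    var_total = sum(corr(a, b) * sd[a] * sd[b] for a in tests for b in tests)
    return cov_with_total / (sd[part] * math.sqrt(var_total))

for test in sd:
    print(test, round(part_whole_r(test), 2))
```

Each component correlates with the total at about .9, well above the intercorrelations of .70 to .76 among the separate tests, since each score is in part being correlated with itself.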


TABLE IV

CORRELATIONS BETWEEN QUESTION MARK SCORES AND
GUILFORD-MARTIN PERSONALITY TRAIT SCORES, ACE, AND EDUCATION

                                                      Sum of Guilford-
                   Total Guilford-                    Martin and Bern-
                   Martin "?'s"      Bernreuter "?'s" reuter "?'s"
                   (N = 337)         (N = 273)        (N = 273)

 1. ACE-Q               .04              -.07              .00
 2. ACE-L               .03               .00              .03
 3. ACE-Total           .04              -.02              .03
 4. Education           .02              -.01              .01
 5. S                   .04               .05              .09
 6. T                  -.16              -.22             -.18
 7. D                  -.01              -.05              .03
 8. C                  -.06              -.10          [illegible]
 9. R                  -.39              -.33             -.42
10. O                  -.30              -.17             -.30
11. Co                 -.26              -.17             -.24
12. Ag                 -.24              -.15             -.26
13. G                  -.29              -.26             -.30
14. A                  -.22              -.15             -.22
15. M                  -.36              -.23             -.35
16. I                  -.30              -.19             -.30
17. N                  -.22              -.12             -.23

TABLE V

HOW "?" RESPONSES ENTER INTO TRAIT SCORES

               Total No.   No. Items Scored       C-Score
   Test        Items       For "?" By Trait       Obtained

1. STDCR         175       S-27; T-0; D-17;       S-3; T-10; D-6;
                           C-14; R-0              C-8; R-0
2. OAgCo         150       O-3; Ag-5; Co-6        All C scores are 0
3. GAMIN         186       None                   All C scores are 0


